@device(xgp)
@make<report>
@Counter(SubParagraph,Within Paragraph,TitleEnv HD4,ContentsEnv tc4,
Numbered [#1.],Referenced [#.1],IncrementedBy Use,Break)

@PageHeading(left "@c[Oleinick's Thesis]", right "@c[Preliminary]")
@Chapter<Introduction>
The purpose of this research is to demonstrate how to write parallel
programs that effectively use the multiple computers in a
multiprocessor. Developing strategies for incorporating parallelism
into algorithms has been an area of intense interest for quite some
time, e.g., [Avriel and Wilde 66], [Karp and Miranker 68],
[Rosenfeld and Driscoll 69], [Heller 76], [Hyafil and Kung 76],
[Thompson and Kung 76], [Baudet, Brent and Kung 77] and [Baudet 78].
However, until very recently, only simulation and analysis techniques
were available for demonstrating the effectiveness of a parallel
algorithm.

With the emergence of multiprocessor computer systems that provide
users with the facilities for constructing parallel algorithms (e.g.,
Cm* and C.mmp), an algorithm's performance can now be verified by
implementing it. Initial attempts to demonstrate the performance of a
simple parallel algorithm [Fuller 75] revealed unexpectedly large
degradations. These degradations were not the result of an error or
inefficiency in decomposing the problem into cooperating processes.
Rather, several non-algorithmic factors were determined to be their
cause. This result indicates that in order to develop effective
parallel algorithms for multiprocessors, it is necessary to be aware
of the target machine's performance characteristics.

Presently, the task of writing effective parallel software is an ad
hoc procedure of constructing code for a unique machine. Since
multiprocessors are almost as different from one another as they are
from uniprocessors, it is difficult to apply insight gained from
writing parallel software for one multiprocessor to another machine.
However, by documenting the performance of various implementations of
several algorithms on one machine, we can demonstrate the
effectiveness of various strategies for capturing parallelism.

One style of parallel programming for multiprocessors involves tightly
coupled cooperating processes. Several decomposition strategies exist
that use this approach, among them @i[pipelining] and @i[partitioning]
[Jones 78]. In both cases, simultaneously executing processes must
interact frequently. Since inter-process communication constitutes an
overhead, tightly coupled systems exhibit performance degradations
proportional to the amount of interaction among the processes. Thus,
in order to maintain high performance, one must reduce both the
overhead of inter-process communication and the amount of process
interaction.

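To make the partitioning style concrete, the sketch below divides a
summation across a fixed set of worker processes that interact only
when their partial results are combined. It is an illustration only:
C.mmp software was not written in C, POSIX threads merely stand in
for cooperating processes, and the worker count and problem size are
arbitrary.
@begin(verbatim)
/* Illustration only: partitioning a summation across NWORKERS
   workers.  Each worker owns one contiguous slice of the data and
   interacts with the others only when results are combined. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NITEMS   1000

static double data[NITEMS];
static double partial[NWORKERS];

static void *worker(void *arg)
{
    long id = (long)arg;
    long lo = id * NITEMS / NWORKERS;        /* this worker's slice */
    long hi = (id + 1) * NITEMS / NWORKERS;
    double sum = 0.0;
    for (long i = lo; i < hi; i++)
        sum += data[i];
    partial[id] = sum;        /* interaction only at the end */
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    for (long i = 0; i < NITEMS; i++)
        data[i] = 1.0;
    for (long id = 0; id < NWORKERS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    double total = 0.0;
    for (long id = 0; id < NWORKERS; id++) {
        pthread_join(t[id], NULL);
        total += partial[id];
    }
    printf("total = %g\n", total);
    return 0;
}
@end(verbatim)
A partitioned decomposition like this confines interaction to the
boundaries of the computation; a pipeline decomposition would instead
pass intermediate results between stages, trading that isolation for
more frequent communication.
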
The user has little power to reduce the overhead of inter-process
communication. Since processes are created and maintained by the
operating system, inter-process communication is permitted in only a
few well-defined ways. The user is given a selection of primitives
provided by the operating system, with which he can build his own
communication mechanisms. However, the performance of any such
mechanism is directly influenced by the performance of the operating
system's primitives.

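As a sketch of what this layering looks like, the fragment below
builds a simple mutual-exclusion lock from a single lower-level
primitive, here the C11 atomic exchange; it illustrates the general
pattern, not the operating system's actual primitives. Because every
acquisition goes through the underlying primitive, the primitive's
speed bounds the speed of any mechanism built on it.
@begin(verbatim)
/* A minimal sketch: a lock built on one indivisible primitive.
   C11 atomic_exchange stands in for the system-provided operation. */
#include <stdatomic.h>

typedef struct { atomic_int held; } spinlock;   /* init to { 0 } */

void spin_lock(spinlock *l)
{
    /* Busy-wait until the exchange observes the lock free; each
       attempt pays the full cost of the underlying primitive. */
    while (atomic_exchange(&l->held, 1))
        ;                          /* spin */
}

void spin_unlock(spinlock *l)
{
    atomic_store(&l->held, 0);     /* release */
}
@end(verbatim)
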
However, writing effective parallel software requires an awareness of
more than just the overheads involved in inter-process communication.
We adopted the following two-phase strategy for uncovering the major
influences on performance:
@begin(enumerate,group)

Develop a simple parallel algorithm as a vehicle for conducting a
performance study on C.mmp.

Use this test program to measure the effects on performance stemming
from both the hardware and the operating system.
@end(enumerate)
A brief introduction to the C.mmp environment, both hardware and
operating system, is contained in chapter two. In addition, chapter
two contains the development of the rootfinding algorithm and the
calculation of its theoretical performance.

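As a rough preview of the kind of algorithm chapter two develops, the
sketch below is a reconstruction, not the thesis code, of an n-process
rootfinder: each step probes f at n interior points of the current
interval (concurrently, on a multiprocessor) and keeps the subinterval
across which f changes sign, narrowing the interval by a factor of n+1
per step rather than bisection's factor of 2. The function f, the
tolerance, and all constants are placeholders.
@begin(verbatim)
/* Reconstruction for illustration only: a serial simulation of an
   n-process rootfinder that narrows [a,b] by probing f at n interior
   points per step (each probe would be one process). */
#include <stdio.h>

static double f(double x) { return x * x - 2.0; }  /* placeholder f */

int main(void)
{
    int n = 4;                    /* stands in for n processes     */
    double a = 0.0, b = 2.0;      /* f(a) and f(b) differ in sign  */
    while (b - a > 1e-12) {
        double step = (b - a) / (n + 1);
        double lo = a;
        /* On a multiprocessor these n+1 evaluations would run
           concurrently; the processes would synchronize only to
           select the new subinterval. */
        for (int i = 1; i <= n + 1; i++) {
            double hi = (i == n + 1) ? b : a + i * step;
            if (f(lo) * f(hi) <= 0.0) { a = lo; b = hi; break; }
            lo = hi;
        }
    }
    printf("root = %.12f\n", (a + b) / 2.0);        /* ~ sqrt(2)  */
    return 0;
}
@end(verbatim)
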
The investigation into the sources of performance perturbation is
presented in chapter three.

Since synchronization is a fundamental parallel programming issue,
chapter four is devoted entirely to studying the effects of
synchronization on performance. The evaluation of the various
synchronization primitives is conducted at two levels: the speed of
the basic synchronization operations, and the impact each primitive
has on the performance of the rootfinding algorithm.

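The first of these two levels can be approached with a microbenchmark:
time a long run of acquire/release pairs and divide by the count. The
sketch below does this with a POSIX mutex and is purely illustrative;
chapter four's measurements concern C.mmp's own primitives, not any
modern library.
@begin(verbatim)
/* Illustration only: measuring the bare cost of a synchronization
   operation by timing many uncontended acquire/release pairs. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    enum { REPS = 1000000 };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < REPS; i++) {
        pthread_mutex_lock(&m);    /* the operation under test */
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per lock/unlock pair\n", ns / REPS);
    return 0;
}
@end(verbatim)
A measurement of this first kind isolates the primitive itself; the
second level, running the rootfinder under each primitive, captures
the primitive's effect on a whole computation.
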
In chapter five, we apply the insights gained from the initial
investigation toward developing a technique for efficiently
implementing tightly coupled complex systems. This technique involves
decomposing a problem into @i[task forces] [Jones 77] of cooperating
processes.

Chapter six contains the implementation and evaluation of a complex
task: the HARPY speech recognition system. An initial decomposition of
the algorithm is successively refined in three implementations. In
each iteration, some aspect of performance is improved. This
incremental refinement of the algorithm enables us to measure the
performance improvement contributed by each enhancement.

Chapter seven contains a summary of the investigations into the
implementation of parallel algorithms. In the conclusion, the major
results and contributions of the thesis are summarized.